
feat(pt_expt): add dp finetune support #5331

Merged
wanghan-iapcm merged 7 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-ft
Mar 23, 2026

Conversation

@wanghan-iapcm
Collaborator

@wanghan-iapcm wanghan-iapcm commented Mar 20, 2026

Summary

  • Add --finetune, --model-branch, and --use-pretrain-script support to dp --pt-expt train, mirroring the pt backend's finetune flow (load pretrained checkpoint, change type map, selective weight copy, output bias adjustment)
  • Support finetuning from both .pt checkpoints and frozen .pte models (embed model_params in .pte during freeze for --use-pretrain-script)
  • Fix a bug in dpmodel's base_atomic_model.change_type_map where out_bias/out_std were not extended before remapping when the new type map introduces unseen types, causing IndexError with negative remap indices
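
The change_type_map fix can be illustrated with a minimal numpy sketch (function signature and array shapes are assumptions for illustration; the real code lives in deepmd/dpmodel/atomic_model/base_atomic_model.py):

```python
import numpy as np

def change_type_map(out_bias, out_std, old_map, new_map):
    """Remap per-type out_bias/out_std to a new type map.

    out_bias/out_std are assumed to have shape (nvar, ntypes, ndim).
    Types in new_map that are unseen in old_map get zero bias and unit
    std; the arrays must be extended BEFORE remapping, otherwise the -1
    remap index wraps around and picks the last old type's row.
    """
    remap = [old_map.index(t) if t in old_map else -1 for t in new_map]
    if any(i < 0 for i in remap):
        nvar, _, ndim = out_bias.shape
        # append one placeholder row: zero bias, unit std; index -1 now
        # resolves to this row instead of an existing type's statistics
        out_bias = np.concatenate([out_bias, np.zeros((nvar, 1, ndim))], axis=1)
        out_std = np.concatenate([out_std, np.ones((nvar, 1, ndim))], axis=1)
    return out_bias[:, remap, :], out_std[:, remap, :]
```

For example, remapping from ["O", "H"] to ["H", "C", "N", "O"] moves the old H row to index 0, the old O row to index 3, and fills C and N with zero bias and unit std.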

Usage examples

# Finetune from a .pt checkpoint
dp --pt-expt train input.json --finetune pretrained.pt

# Finetune from a frozen .pte model
dp --pt-expt train input.json --finetune pretrained.pte

# Copy descriptor/fitting config from pretrained model
dp --pt-expt train input.json --finetune pretrained.pt --use-pretrain-script

# Finetune from a multi-task pretrained model (select a branch)
dp --pt-expt train input.json --finetune pretrained.pt --model-branch Default

# Re-initialize fitting net randomly (only keep descriptor weights)
dp --pt-expt train input.json --finetune pretrained.pt --model-branch RANDOM

Files changed

File Change
deepmd/pt_expt/utils/finetune.py New get_finetune_rules() for pt_expt; supports .pt and .pte
deepmd/pt_expt/entrypoints/main.py Wire --finetune/--model-branch/--use-pretrain-script through train() → get_trainer() → Trainer; pass model_params to .pte during freeze
deepmd/pt_expt/train/training.py Finetune weight loading in Trainer.__init__ (.pt and .pte); model_change_out_bias()
deepmd/pt_expt/utils/serialization.py Embed/extract model_params.json in .pte archive
deepmd/dpmodel/atomic_model/base_atomic_model.py Fix change_type_map to extend out_bias/out_std for new types (array-api compatible)
source/tests/pt_expt/test_finetune.py New — 9 tests covering bias adjustment, type map change, CLI dispatch, .pte finetune, --use-pretrain-script, random_fitting, inherited weight consistency
source/tests/consistent/model/test_ener.py Add test_change_type_map_new_type verifying out_bias/out_std extension across dp, pt, pt_expt
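
The serialization change (embedding model_params.json in the frozen artifact, with a fallback for older files) can be sketched with the stdlib, assuming the .pte export is a zip-based archive; function names and the archive layout here are illustrative, not the actual deepmd API:

```python
import json
import zipfile

def embed_model_params(archive_path: str, model_params: dict) -> None:
    # append model_params.json to an existing zip-based artifact
    with zipfile.ZipFile(archive_path, "a") as zf:
        zf.writestr("model_params.json", json.dumps(model_params))

def extract_model_params(archive_path: str, fallback_type_map: list) -> dict:
    # older artifacts lack the entry; fall back to a minimal dict with
    # just the type map, matching the behavior described in this PR
    with zipfile.ZipFile(archive_path) as zf:
        if "model_params.json" in zf.namelist():
            return json.loads(zf.read("model_params.json"))
    return {"type_map": fallback_type_map}
```

With this fallback, --use-pretrain-script cannot recover the descriptor/fitting config from a legacy .pte, which is why the later commit adds an explicit error for that combination.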

Test plan

  • python -m pytest source/tests/pt_expt/test_finetune.py -v (9 passed)
  • python -m pytest source/tests/pt_expt/test_training.py -v (11 passed, no regression)
  • python -m pytest source/tests/consistent/model/test_ener.py -k change_type_map -v (3 passed)
  • python -m pytest source/tests/consistent/descriptor/test_se_e2_a.py -v (351 passed, no regression)

Summary by CodeRabbit

  • New Features

    • Fine-tuning workflow: supply pretrained checkpoints, select branch, and toggle pretrain-script behavior
    • Automatic expansion of atom type maps (new types get zero bias and unit std) while preserving existing mappings
    • Improved finetune resume: selective merging of pretrained descriptor/fitting weights and bias-adjustment modes
    • Export/import now embeds model metadata in frozen artifacts and restores it on load
  • Tests

    • Unit and end-to-end tests for finetuning, bias adjustment, type-map expansion, and frozen-artifact scenarios

Han Wang added 3 commits March 20, 2026 23:04
Add `--finetune`, `--model-branch`, and `--use-pretrain-script` support
to `dp --pt-expt train`. The implementation mirrors the pt backend's
finetune flow: load pretrained checkpoint, optionally change type map,
selectively copy weights (descriptor always from pretrained, fitting
conditionally), and adjust output bias.

Also fix a bug in dpmodel's base_atomic_model.change_type_map where
out_bias/out_std were not extended before remapping when the new type
map introduces unseen types, causing an IndexError with negative
remap indices.
Extend the finetune flow to accept .pte frozen models as the
pretrained source, in addition to .pt checkpoints.  The .pte file
is loaded via serialize_from_file + BaseModel.deserialize to
reconstruct the pretrained model with weights.

Embed model_params in the .pte archive during freeze so that
--use-pretrain-script works with .pte sources.  Older .pte files
without embedded model_params fall back to a minimal dict with
just type_map.

Add weight consistency checks to CLI tests (lr=1e-30 to prevent
training from modifying weights) verifying descriptor and fitting
weights match the pretrained model after finetune initialization.
The DPA1 test_finetune_change_type bias-adjusted comparison failed
because the two trainers (with different type maps) sampled different
data frames for bias adjustment.  The data set has 80 frames but
data_stat_nbatch=1 sampled only 1 frame, and the frame selection
depended on numpy RNG state which differed between the two trainers.

Fix by subsampling the data to 2 frames in TestEnergyModelDPA1 and
using batch_size=2 so all frames are consumed deterministically.
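
The weight-consistency check described above can be sketched as follows (a hypothetical helper using numpy stand-ins for state-dict tensors; the real tests in source/tests/pt_expt/test_finetune.py operate on torch state dicts):

```python
import numpy as np

def assert_weights_inherited(ft_state, pre_state, random_fitting=False):
    """Compare a finetuned state dict against the pretrained one.

    Descriptor weights must always match the pretrained model.  Fitting
    weights must match too, unless random_fitting is set, in which case
    at least one fitting tensor must differ.  bias_atom_e is skipped
    because bias adjustment rewrites it in both modes.
    """
    fitting_differs = False
    for k, v in ft_state.items():
        if k not in pre_state or "bias_atom_e" in k:
            continue
        if ".descriptor" in k:
            assert np.allclose(v, pre_state[k]), f"descriptor mismatch: {k}"
        elif ".fitting" in k:
            if random_fitting:
                fitting_differs |= not np.array_equal(v, pre_state[k])
            else:
                assert np.allclose(v, pre_state[k]), f"fitting mismatch: {k}"
    if random_fitting:
        assert fitting_differs, "all fitting weights equal despite random init"
```

Running the finetuned model with lr=1e-30 (as in the tests above) keeps the optimizer from perturbing the weights, so these comparisons hold after training starts.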
@wanghan-iapcm wanghan-iapcm requested a review from iProzd March 20, 2026 16:21
@dosubot dosubot Bot added the new feature label Mar 20, 2026
Comment thread source/tests/consistent/model/test_ener.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1be2ec5ef


Comment thread deepmd/pt_expt/utils/finetune.py
@coderabbitai
Contributor

coderabbitai Bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 7c75b73 and 20f259c.

📒 Files selected for processing (3)
  • deepmd/pt_expt/entrypoints/main.py
  • deepmd/pt_expt/train/training.py
  • deepmd/pt_expt/utils/serialization.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • deepmd/pt_expt/utils/serialization.py
  • deepmd/pt_expt/entrypoints/main.py

📝 Walkthrough

Walkthrough

Adds pt_expt fine-tuning flow: expands type-map remap to grow per-type out_bias/out_std for new atom types, embeds/extracts model_params in frozen artifacts, introduces finetune rule extraction and Trainer logic for selective pretrained weight transfer and bias adjustment, and adds unit and e2e tests.

Changes

Cohort / File(s) Summary
Type Map Handling
deepmd/dpmodel/atomic_model/base_atomic_model.py
change_type_map now grows per-type out_bias (zeros) and out_std (ones) when new atom types are introduced before performing the existing remap.
Fine-tuning Utilities
deepmd/pt_expt/utils/finetune.py
New module providing get_finetune_rules() and helpers to detect frozen .pte/.pt2 vs .pt, extract model_params, and build/validate finetune routing (including descriptor checks and branch resolution).
Serialization Support
deepmd/pt_expt/utils/serialization.py
serialize_from_file() now returns embedded "model_params" when present; deserialize_to_file() gained optional model_params parameter to embed "model_params.json" into .pte exports.
CLI & Trainer Integration
deepmd/pt_expt/entrypoints/main.py, deepmd/pt_expt/train/training.py
Added CLI flags (--finetune, --model_branch, --use_pretrain_script), get_trainer/Trainer.__init__ accept finetune_model/finetune_links; resume rules for .pte/.pt2 vs .pt implemented; selective pretrained weight transfer (descriptor vs fitting vs _extra_state), type-map growth handling, and model_change_out_bias() helper added.
Tests — unit & e2e
source/tests/consistent/model/test_ener.py, source/tests/pt_expt/test_finetune.py
Adds unit test for change_type_map with new atom types and comprehensive model-level and CLI end-to-end finetune tests covering bias adjustment, type remap consistency, frozen .pte sources, use_pretrain_script, and RANDOM fitting behavior.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CLI (entrypoint)
    participant Config as Config
    participant Finetune as FinetuneRules
    participant Serializer as Serializer
    participant Trainer as Trainer
    participant Model as Model

    User->>CLI: dp --pt-expt train --finetune model.pte --use-pretrain-script
    CLI->>Config: load config & init model
    CLI->>Finetune: get_finetune_rules(model.pte, model_config, model_branch)
    Finetune->>Serializer: serialize_from_file(.pte/.pt)
    Serializer-->>Finetune: model data + model_params
    Finetune-->>CLI: finetune_links
    CLI->>Trainer: Trainer(finetune_model, finetune_links)
    Trainer->>Serializer: deserialize pretrained (.pte/.pt)
    Trainer->>Trainer: determine resume vs finetune rules
    Trainer->>Model: selective weight transfer (descriptor / fitting / _extra_state)
    Trainer->>Model: check finetune_links.Default.get_has_new_type()
    alt new types present
        Trainer->>Model: change_type_map(new_type_map) -> expand out_bias/out_std + remap
    end
    Trainer->>Model: model_change_out_bias(sample_func, mode)
    Trainer->>Model: start finetune training
    Trainer->>Serializer: deserialize_to_file(output.pte, data, model_params)
    Serializer-->>User: saved checkpoint (.pte) with embedded model_params
sequenceDiagram
    participant Trainer
    participant Pretrained as PretrainedCheckpoint
    participant Target as TargetModel
    participant Rule as FinetuneRule

    Trainer->>Pretrained: load weights (.pte/.pt)
    Trainer->>Rule: get_has_new_type()
    alt New Types Detected
        Trainer->>Target: change_type_map(new_type_map)
        Target->>Target: expand out_bias/out_std then remap
    end
    Trainer->>Target: copy descriptor weights from pretrained
    Trainer->>Rule: get_random_fitting()
    alt Keep Random Fitting
        Note over Target: keep random init for fitting params
    else Use Pretrained Fitting
        Trainer->>Target: copy fitting weights from pretrained
    end
    Trainer->>Target: change_out_bias(mode)
    Target->>Target: adjust bias via statistics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • iProzd
  • njzjz
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(pt_expt): add dp finetune support' clearly and concisely summarizes the main objective—adding finetune functionality to the pt_expt training pipeline—which is the primary focus across all modified files.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Around line 384-385: The current logic sets resume_model = init_model or
restart_model or finetune_model so finetune can incorrectly pick init/restart
checkpoints; change this by (a) validating inputs up front in the function that
defines init_model/restart_model/finetune_model and raise an error if more than
one of those is set, OR (b) keep the existing resume_model variable but in the
finetune branch explicitly load weights from finetune_model (not resume_model)
and use that checkpoint to populate descriptor/fitting weights and
_extra_state["model_params"]; update both the initial resume/resuming block
(resume_model/resuming) and the finetune-specific code region (~lines 487-527)
to follow the chosen approach so finetune never inherits init/restart weights.
- Around line 991-1002: The log attempts to convert CUDA tensors returned by
_model.get_out_bias() to numpy via np.asarray which raises RuntimeError on CUDA;
replace the np.asarray(...) calls with to_numpy_array(...) from
deepmd.dpmodel.common when building the log message after calling
_model.change_out_bias (and similarly anywhere else you call np.asarray on
_model.get_out_bias()), so call to_numpy_array(old_bias).reshape(-1) and
to_numpy_array(new_bias).reshape(-1) (slicing by len(model_type_map) as before)
to ensure device-safe conversion for logging.

In `@deepmd/pt_expt/utils/finetune.py`:
- Around line 35-40: The code currently falls back to returning only
{"type_map": ...} when serialize_from_file(finetune_model) lacks "model_params",
which silently allows change_model_params=True to proceed with incomplete
config; modify the logic in the finetune model-loading blocks (where
serialize_from_file, finetune_model is used and again in the block around lines
79-92) to detect when change_model_params is True and "model_params" is missing,
and immediately raise a clear error (including mention of using
--use-pretrain-script or that legacy .pte lacks model_params.json) instead of
returning the minimal dict; ensure the error path prevents calling
get_finetune_rule_single with incomplete input.

In `@source/tests/consistent/model/test_ener.py`:
- Around line 1333-1423: The test wrongly sets dp_std_orig =
to_numpy_array(dp_model.get_out_bias()) instead of snapshotting the original
out_std and then never asserts remapping for old types; change the snapshot to
dp_std_orig = to_numpy_array(dp_model.atomic_model.out_std) (and similarly
ensure any other std snapshots use atomic_model.out_std), seed a non-trivial
out_std on dp_model before change_type_map, then add assertions that the
remapped old entries land at indices 3 and 0 (compare dp_std_new[:, 3, :] to
dp_std_orig[:, 0, :] for "O" and dp_std_new[:, 0, :] to dp_std_orig[:, 1, :] for
"H"), keep cross-backend equality checks (pt_model, pt_expt_model) and remove or
use any now-unused locals; run ruff check . and ruff format . before committing.

In `@source/tests/pt_expt/test_finetune.py`:
- Around line 564-580: The loop comparing ft_state and pre_state must, when
random_fitting is True, assert that fitting tensors are not all identical:
locate the loop over ft_state and the variables ft_state, pre_state and
random_fitting; gather keys containing ".fitting" present in both ft_state and
pre_state and assert that at least one of those tensors differs (e.g., by
checking torch.any(ft_state[k] != pre_state[k]) for at least one k), failing the
test if all fitting tensors are equal.


📥 Commits

Reviewing files that changed from the base of the PR and between 91e3d62 and c1be2ec.

📒 Files selected for processing (7)
  • deepmd/dpmodel/atomic_model/base_atomic_model.py
  • deepmd/pt_expt/entrypoints/main.py
  • deepmd/pt_expt/train/training.py
  • deepmd/pt_expt/utils/finetune.py
  • deepmd/pt_expt/utils/serialization.py
  • source/tests/consistent/model/test_ener.py
  • source/tests/pt_expt/test_finetune.py

Comment thread deepmd/pt_expt/train/training.py
Comment thread deepmd/pt_expt/train/training.py
Comment thread deepmd/pt_expt/utils/finetune.py
Comment thread source/tests/consistent/model/test_ener.py Outdated
Comment thread source/tests/pt_expt/test_finetune.py Outdated
Older .pte files (or those produced by external code calling
deserialize_to_file without model_params) lack the embedded
model_params.json.  When --use-pretrain-script is used with such
files, get_finetune_rule_single would crash with a KeyError on
"descriptor".  Add an explicit check with a clear error message.
@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 90.56604% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.42%. Comparing base (6122d97) to head (20f259c).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
deepmd/pt_expt/entrypoints/main.py 69.23% 4 Missing ⚠️
deepmd/pt_expt/train/training.py 94.23% 3 Missing ⚠️
deepmd/pt_expt/utils/finetune.py 89.28% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5331      +/-   ##
==========================================
+ Coverage   82.40%   82.42%   +0.02%     
==========================================
  Files         783      784       +1     
  Lines       79031    79124      +93     
  Branches     3675     3675              
==========================================
+ Hits        65122    65219      +97     
+ Misses      12736    12731       -5     
- Partials     1173     1174       +1     


- Reject combining finetune_model with init_model/restart_model
- Use to_numpy_array instead of np.asarray in model_change_out_bias
  for CUDA tensor safety
- Remove unused variables dp_std_orig/dp_std_before in test_ener.py
- Add out_std remap correctness assertion for old types
- Assert fitting weights differ (not just skip) for random_fitting=True,
  excluding bias_atom_e which is set by bias adjustment
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 21, 2026
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Mar 21, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
source/tests/pt_expt/test_finetune.py (1)

122-132: Minor: Redundant import.

shutil is already imported at line 16; the local import shutil as _shutil on line 124 is unnecessary.

Suggested fix
 def _subsample_data(src_dir: str, dst_dir: str, nframes: int = 2) -> None:
     """Copy a data system, keeping only the first *nframes* frames."""
-    import shutil as _shutil
-
-    _shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
+    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
     set_dir = os.path.join(dst_dir, "set.000")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/test_finetune.py` around lines 122 - 132, The helper
_subsample_data currently does a redundant local import "import shutil as
_shutil"; remove that line and use the module already imported at the top-level
(shutil) when calling copytree in _subsample_data so the function uses
shutil.copytree(dst_dir, ...) instead of the locally imported _shutil; update
references in _subsample_data to shutil to avoid the duplicate import.


📥 Commits

Reviewing files that changed from the base of the PR and between 603fcd9 and cb6ab5f.

📒 Files selected for processing (3)
  • deepmd/pt_expt/train/training.py
  • source/tests/consistent/model/test_ener.py
  • source/tests/pt_expt/test_finetune.py
✅ Files skipped from review due to trivial changes (1)
  • source/tests/consistent/model/test_ener.py

…in finetune tests

Replace np.asarray() with to_numpy_array() when converting model
bias tensors to numpy arrays. np.asarray() fails on CUDA tensors
with "can't convert cuda:0 device type tensor to numpy", while
to_numpy_array() handles device transfer automatically.
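
The device-safe conversion can be sketched as below. This is a minimal stand-in for deepmd.dpmodel.common's to_numpy_array, not its actual implementation; the duck-typed detach check is an assumption so the sketch runs without torch installed:

```python
import numpy as np

def to_numpy_array(t):
    # torch tensors must be detached and moved to CPU before numpy
    # conversion; calling np.asarray directly on a CUDA tensor raises
    # "can't convert cuda:0 device type tensor to numpy"
    if hasattr(t, "detach"):  # duck-typed torch.Tensor
        return t.detach().cpu().numpy()
    return np.asarray(t)
```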
@wanghan-iapcm wanghan-iapcm added the Test CUDA Trigger test CUDA workflow label Mar 22, 2026
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Mar 22, 2026
@iProzd iProzd added this pull request to the merge queue Mar 23, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Mar 23, 2026
# Conflicts:
#	deepmd/pt_expt/utils/serialization.py
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Mar 23, 2026
Merged via the queue into deepmodeling:master with commit 034e613 Mar 23, 2026
70 checks passed
@wanghan-iapcm wanghan-iapcm deleted the feat-pt-expt-ft branch March 23, 2026 15:30
@coderabbitai coderabbitai Bot mentioned this pull request Apr 2, 2026

3 participants